This directory contains the R code to read and process the original emissions and mortality data to be used in the model. It contains two sub-directories:

Folder 1: CDC_HHS_Mortality -- for the mortality data from the Centers for Disease Control and Prevention (CDC) and the U.S. Department of Health and Human Services (HHS). 
Source: Centers for Disease Control and Prevention (2020) and U.S. Department of Health and Human Services (2020).

Folder 2: NEI -- for the National Emissions Inventory (NEI) data from the Environmental Protection Agency.
Source: Environmental Protection Agency (2018a; 2018b; 2020a; 2020b). 

Full references provided at the end of this document.

Both folders, CDC_HHS_Mortality and NEI, contain the following sub-directories (a, b, c, and d below).

All inputs are provided for the CDC_HHS_Mortality code.

Executing the code for NEI data requires downloading the original NEI data. 
We use data from four NEIs (2008, 2011, 2014, and 2017), and a separate R script is provided for each year.
The files that need to be downloaded are described in the references section.

a) Processed_Data_For_Model
Description: Final processed data, to be used as inputs to the model

b) Code (R Scripts)
Description: R Scripts

c) Auxiliary_Data
Description: Auxiliary data needed for the code, in this case just a list of counties and county codes

d) Original_Data_Pre_Processing
Description: The original data as provided by the sources. We include these files only for the CDC/HHS Mortality data. The original NEI data must be downloaded; links are provided in the references section at the end of this document.

For CDC/HHS Mortality, we provide the original data prior to processing, from the CDC (mortality counts) and HHS (population counts). We provide four files:
i) CountyDeaths.csv
5-year Mortality Counts in U.S. Counties by age group, where mortality counts were >= 50 deaths. This dataset is 42,830 rows and 6 columns. 
The columns are: 
1: FIPS State + County Code
2: Age Group
3: Age Group Code
4: Deaths
5: Cause of Death
6: Year
Source: Centers for Disease Control and Prevention, 2020

ii) StateDeaths.csv
5-year Mortality Counts in U.S. States by age group. This dataset is 1,274 rows and 6 columns.
The columns are:
1: FIPS State Code
2: Age Group
3: Age Group Code
4: Deaths
5: Cause of Death
6: Year
Source: Centers for Disease Control and Prevention, 2020

iii) CountyPop.csv
Population in U.S. Counties by age group.  
The dataset is 161,642 rows by 5 columns.
The columns are:
1: FIPS State + County Code
2: Age Group
3: Age Group Code
4: Population
5: Year
Source: U.S. Department of Health and Human Services (2020)

iv) StatePop.csv
Population in U.S. States by age group.
The dataset is 2,548 rows by 5 columns.
The columns are:
1: FIPS State Code
2: Age Group
3: Age Group Code
4: Population
5: Year
Source: U.S. Department of Health and Human Services (2020)
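
A minimal sketch of reading one of these files in R. It assumes the file has a header row and that FIPS codes are stored with leading zeros (for example 01001); neither is confirmed in this document. The column names and the toy rows are hypothetical and only stand in for CountyDeaths.csv. Reading every column as character first keeps any leading zeros in the FIPS codes, which numeric parsing would drop.

```r
# Toy stand-in for CountyDeaths.csv; column names and values are hypothetical.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "FIPS,AgeGroup,AgeGroupCode,Deaths,Cause,Year",
  "01001,25-29 years,25-29,52,All causes,2008",
  "06037,80+ years,80+,1234,All causes,2008"
), toy_path)

# Read everything as character so FIPS codes keep their leading zeros,
# then convert the count column explicitly.
county_deaths <- read.csv(toy_path, colClasses = "character")
county_deaths$Deaths <- as.integer(county_deaths$Deaths)

# Basic shape check; the real file is described as 42,830 rows by 6 columns.
dim(county_deaths)
```

The same pattern should apply to StateDeaths.csv, CountyPop.csv, and StatePop.csv, with the expected dimensions listed above.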

For NEI:
i) Emissions data from NEIs 2008, 2011, 2014, and 2017 have to be downloaded (see instructions at the end of the document)

Additional inputs required that we provide with the code:
ii) SCC_Conversion.csv
A conversion of source classification codes (SCC) from the SCCs used in NEI 2008 to the SCCs used in other NEIs
Source: U.S. Environmental Protection Agency, Population and Activity of On-Road Vehicles in MOVES 2014. (2016). https://nepis.epa.gov/Exe/ZyPDF.cgi?Dockey=P100O7PS.pdf (accessed 27 August 2020).
Note: Code conversion used can be found in Table 21-2, pp. 124-126.

iii) CARB_GHG_OnRoad.csv
Greenhouse-Gas (GHG) emissions data for the State of California from the California Air Resources Board.
Source: California Air Resources Board, Data from “GHG Inventory Data Archive: 2018 Edition: Years 2000-2017.” https://ww3.arb.ca.gov/cc/inventory/pubs/reports/2000_2016/ghg_inventory_by_sector_all_00-16.xlsx (accessed 27 August 2020).


-----
DESCRIPTION OF OUTPUTS (i.e. Processed Data for Model)
Outputs are the processed data to be used in the model.

1. For CDC/HHS Mortality data, one file is provided as output, available at CDC_HHS_Mortality/Processed_Data_For_Model/Mortality.RData 

The file Mortality.RData contains four datasets.
a) Deaths.AllCause.2008 
Death counts per county from all causes and ages, for year 2008
A data frame with 3,108 rows (one for each county) and 2 columns. The columns are:
1) FIPS State + County code
2) Death count estimates for 2008 (from all causes and ages)

b) Deaths.AllCause.2017
Same as a) but for year 2017

c) Deaths.GEMM.2008 
Death counts per county from non-accidental causes for each GEMM age group, for year 2008
A data frame with 3,108 rows (one for each county) and 13 columns. The columns are:
1) FIPS State + County code
2 to 13) Death count estimates from non-accidental causes for 2008, for each five-year GEMM age group. The age groups are denoted by their mid-point, and the 80+ age group is denoted by 85. In order, from columns 2 to 13, the five-year age groups are:
2) 25-29 years old
3) 30-34
4) 35-39
5) 40-44
6) 45-49
7) 50-54
8) 55-59
9) 60-64
10) 65-69
11) 70-74
12) 75-79
13) 80+

d) Deaths.GEMM.2017
Same as c) but for year 2017
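
A minimal sketch of loading an .RData file and inspecting its contents. The toy file below only stands in for Mortality.RData, and the column names and values are hypothetical. Loading into a separate environment is a design choice made here for illustration, so that objects already in the workspace are not overwritten.

```r
# Toy stand-in for Mortality.RData; the real file holds four datasets.
toy_file <- tempfile(fileext = ".RData")
Deaths.AllCause.2008 <- data.frame(
  FIPS   = c("01001", "01003"),   # hypothetical column names and values
  Deaths = c(450, 1900),
  stringsAsFactors = FALSE
)
save(Deaths.AllCause.2008, file = toy_file)
rm(Deaths.AllCause.2008)

# Load into a separate environment so nothing in the workspace is overwritten.
mortality <- new.env()
loaded <- load(toy_file, envir = mortality)

loaded                                # names of the objects that were loaded
str(mortality$Deaths.AllCause.2008)   # inspect one dataset
```

With the real file, replace toy_file with the path CDC_HHS_Mortality/Processed_Data_For_Model/Mortality.RData.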



2. For NEI data, four files are provided as outputs, one per NEI year:
NEI/Processed_Data_For_Model/NEIxxxx.RData, where xxxx denotes the NEI year (2008, 2011, 2014, or 2017)
Each .RData file contains the following datasets:

a) NEIxxxx.VTYPE.EF (array of 3,108 counties x 11 columns x 14 vehicle types)
Emission factors (in g/mile) for each of the 13 vehicle types and for all vehicles combined
Each row (dimension 1) denotes a county
Each slice (dimension 3) denotes a vehicle type
The columns (dimension 2) are:
1: State & County FIPS code
2: Vehicle type
3: VMT (in miles)
4: Primary PM 2.5 emission factor (g/mile)
5: SO2 emission factor (g/mile)
6: NOx emission factor (g/mile)
7: NH3 emission factor (g/mile)
8: VOC emission factor (g/mile)
9: CO2 emission factor (g/mile)
10: CH4 emission factor (g/mile)
11: N2O emission factor (g/mile)

The slices (dimension 3) are EPA's Vehicle Types/Source Types as used in NEI 2017 (slices 1-13) and all vehicles combined (slice 14):
1: Motorcycle (VTYPE = 11)
2: Passenger Car (VTYPE = 21)
3: Passenger Truck (VTYPE = 31)
4: Light Commercial Truck (VTYPE = 32)
5: Intercity Bus (VTYPE = 41)
6: Transit Bus (VTYPE = 42)
7: School Bus (VTYPE = 43)
8: Refuse Truck (VTYPE = 51)
9: Single Unit Short-Haul Truck (VTYPE = 52)
10: Single Unit Long-Haul Truck (VTYPE = 53)
11: Motor Home (VTYPE = 54)
12: Combination Short-Haul Truck (VTYPE = 61)
13: Combination Long-Haul Truck (VTYPE = 62)
14: All vehicles combined (VTYPE = 99 in the dataset, although this is not used by EPA)
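
A minimal sketch of indexing an array with this layout (counties x columns x vehicle-type slices). The toy array uses only 3 counties and made-up values. It is numeric throughout, which is an assumption; this document does not state the storage type of the real array.

```r
# Toy array with the same layout: counties x columns x vehicle-type slices.
# Only 3 counties are used; all values are made up for illustration.
n_county <- 3; n_col <- 11; n_slice <- 14
toy <- array(0, dim = c(n_county, n_col, n_slice))
vtype_codes <- c(11, 21, 31, 32, 41, 42, 43, 51, 52, 53, 54, 61, 62, 99)
for (s in seq_len(n_slice)) {
  toy[, 1, s] <- c(1001, 1003, 1005)   # column 1: FIPS (numeric here)
  toy[, 2, s] <- vtype_codes[s]        # column 2: vehicle type code
  toy[, 6, s] <- s / 10                # column 6: NOx emission factor (fake)
}

# Slice 2 is Passenger Car (VTYPE = 21); the result is a counties x columns matrix.
passenger_car <- toy[, , 2]

# Look up a slice by its VTYPE code instead of its position.
slice_99 <- which(toy[1, 2, ] == 99)
nox_all_vehicles <- toy[, 6, slice_99]
```

Looking up the slice by VTYPE code rather than by position avoids relying on the slice order staying the same across NEI years.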

b) NEIxxxx.VTYPE.EMIS
This is similar to a), with the difference that total emissions (in short tons) are provided in columns 4-11

c) NEIxxxx.REF.EMIS
These are VOC refueling emissions, which are not included with any of the 13 vehicle types in datasets a) and b) but are included in the emissions of all vehicles combined (slice 14) in those datasets.
Note that this dataset is not provided for 2008, since that NEI does not list refueling emissions separately.
These datasets are 3,108 x 5 data frames.
The columns are:
1: State & County FIPS code
2: Vehicle Type (which we use as 0 for refueling emissions)
3: VMT for all vehicles combined in each county
4: VOC emissions (in short tons)
5: VOC emission factors (in g/mile, where mile is the sum of VMT for all vehicles in each county) 

We indicate VTYPE = 0 for the refueling datasets.
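
A minimal sketch of how total emissions in short tons relate to emission factors in g/mile, given VMT in miles. The function name and the handling of zero-VMT counties are assumptions made for illustration; the R scripts in the Code folder may compute this differently.

```r
# 1 short ton = 2000 lb = 907,184.74 g (exact, from 1 lb = 453.59237 g).
GRAMS_PER_SHORT_TON <- 2000 * 453.59237

# Convert total emissions (short tons) and VMT (miles) to g/mile.
# Counties with zero VMT return NA instead of Inf or NaN.
emission_factor <- function(emis_short_tons, vmt_miles) {
  ifelse(vmt_miles > 0,
         emis_short_tons * GRAMS_PER_SHORT_TON / vmt_miles,
         NA_real_)
}

# Made-up values for three counties.
voc_tons <- c(10, 0.5, 2)
vmt      <- c(5e8, 2e7, 0)
emission_factor(voc_tons, vmt)
```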



-----
REFERENCES:

Source for CDC/HHS Data:
Mortality Counts:
[dataset] Centers for Disease Control and Prevention, 2020. Underlying Cause of Death 1999-2018 on CDC WONDER Online Database, released in 2020. Centers for Disease Control and Prevention, National Center for Health Statistics. Data are from the Multiple Cause of Death Files, 1999-2018, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. http://wonder.cdc.gov/ucd-icd10.html (accessed 30 November 2020).

Population Counts:
[dataset] U.S. Department of Health and Human Services, 2020. United States Department of Health and Human Services (US DHHS), Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS), Bridged-Race Population Estimates, United States July 1st resident population by state, county, age, sex, bridged-race, and Hispanic origin. Compiled from 1990-1999 bridged-race intercensal population estimates (released by NCHS on 7/26/2004); revised bridged-race 2000-2009 intercensal population estimates (released by NCHS on 10/26/2012); and bridged-race Vintage 2019 (2010-2019) postcensal population estimates (released by NCHS on 7/9/2020). Available on CDC WONDER Online Database. http://wonder.cdc.gov/bridged-race-v2019.html (accessed 30 November 2020).


-----

Sources for NEI data -- these files need to be downloaded:

1. NEI 2008: 
[dataset] U.S. Environmental Protection Agency, 2018a. Data from “2008 National Emissions Inventory (NEI)”, 2008nei_v3.
Links for download:
1.1. Onroad emissions:
Dataset file: '2008neiv3_onroad_byregions.zip'
Available at: https://gaftp.epa.gov/air/nei/2008/data_summaries/2008neiv3_onroad_byregions.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2018-01-31 13:44 
File Size: 208 M
1.2. Supporting data:
Dataset file: '2008nei_supdata_4c.zip'
Available at: https://gaftp.epa.gov/air/nei/2008/doc/2008v3_supportingdata/2008nei_supdata_4c.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2018-01-31 12:43
File Size: 163 M 

2. NEI 2011:
[dataset] U.S. Environmental Protection Agency, 2018b. Data from “2011 National Emissions Inventory (NEI)”, 2011nei_v2. 
Links for download:
2.1. Onroad emissions:
Dataset file: '2011neiv2_onroad_byregions.zip'
Available at: https://gaftp.epa.gov/air/nei/2011/data_summaries/2011v2/2011neiv2_onroad_byregions.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2018-02-01 05:27
File Size: 44 M
2.2. Supporting data:
Dataset file: '2011neiv2_supdata_or_VMT.zip'
Available at: https://gaftp.epa.gov/air/nei/2011/doc/2011v2_supportingdata/onroad/2011neiv2_supdata_or_VMT.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2018-02-21 08:17
File Size: 23 M 

3. NEI 2014: 
[dataset] U.S. Environmental Protection Agency, 2020a. Data from “2014 National Emissions Inventory (NEI)”, 2014nei_v2. 
Links for download:
3.1. Onroad emissions:
Dataset file: '2014neiv2_onroad_byregions.zip'
Available at: https://gaftp.epa.gov/air/nei/2014/data_summaries/2014v2/2014neiv2_onroad_byregions.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2020-11-19 10:01
File Size: 52 M
3.2. Supporting data:
Dataset file: '2014v2_onroad_activity_final.zip'
Available at: https://gaftp.epa.gov/air/nei/2014/doc/2014v2_supportingdata/onroad/2014v2_onroad_activity_final.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2018-05-15 08:58
File Size: 35 M 

4. NEI 2017: 
[dataset] U.S. Environmental Protection Agency, 2020b. Data from “2017 National Emissions Inventory (NEI)”, 2017nei_v1 (Apr 2020). 
Links for download:
4.1. Onroad emissions:
Dataset file: '2017neiApr_onroad_byregions.zip'
Available at: https://gaftp.epa.gov/air/nei/2017/data_summaries/2017v1/2017neiApr_onroad_byregions.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2020-04-30 09:53
File Size: 48 M
4.2. Supporting data:
Dataset file: '2017NEI_onroad_activity_final.zip'
Available at: https://gaftp.epa.gov/air/nei/2017/doc/supporting_data/onroad/2017NEI_onroad_activity_final.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2020-04-14 14:30
File Size: 22 M
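
A minimal sketch of downloading these files from R, shown for NEI 2017. The helper fetch_nei and the destination folder in the example call are hypothetical; check each NEI script for the path it actually reads from. The timeout is raised because R's default of 60 seconds may be too short for files of this size.

```r
# URLs copied from the reference list above (NEI 2017 shown as an example).
nei_urls <- c(
  "https://gaftp.epa.gov/air/nei/2017/data_summaries/2017v1/2017neiApr_onroad_byregions.zip",
  "https://gaftp.epa.gov/air/nei/2017/doc/supporting_data/onroad/2017NEI_onroad_activity_final.zip"
)

# Hypothetical helper: download each file into dest_dir unless it is already there.
fetch_nei <- function(urls, dest_dir) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  old <- options(timeout = 600)   # default is 60 s, which may be too short here
  on.exit(options(old))
  dest <- file.path(dest_dir, basename(urls))
  for (i in seq_along(urls)) {
    if (!file.exists(dest[i])) {
      download.file(urls[i], dest[i], mode = "wb")  # binary mode for zip files
    }
  }
  invisible(dest)
}

# Example call (not run here because it needs network access):
# fetch_nei(nei_urls, "NEI/Original_Data_Pre_Processing")
```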


